This is an data analysis of some important KPIs in Cerro Matoso, a Colombian Nickel Mine. The data set has 11 variables and 10006 observation, that are operational variables over a 1 year of records in 2 shift, day and night. The equipment are truck CAT777 operated by humans.
The data set consists in 11 variables and 10006 observations. following the dictionary is listed.
knitr::kable(cerro_matoso_1[1:5, ])
| fecha | Turno | Operativo | Equipo | Mant Preventiva | Mant Planeada | Mant Correctiva | Over Haul | Demora Prog | Demora Noprog | DemoraStandBy |
|---|---|---|---|---|---|---|---|---|---|---|
| 2017-07-01 | Dia | 0.000000 | M3-32 | 0 | 12.000000 | 0 | 0 | 0.000000 | 0.000000 | 0 |
| 2017-07-01 | Dia | 10.002222 | M3-33 | 0 | 0.000000 | 0 | 0 | 1.450555 | 0.547222 | 0 |
| 2017-07-01 | Dia | 1.150555 | M3-34 | 0 | 10.468888 | 0 | 0 | 0.380555 | 0.000000 | 0 |
| 2017-07-01 | Dia | 10.035277 | M3-35 | 0 | 0.000000 | 0 | 0 | 1.964722 | 0.000000 | 0 |
| 2017-07-01 | Dia | 7.686111 | M3-37 | 0 | 1.713055 | 0 | 0 | 1.607222 | 0.993611 | 0 |
This is a wrangling of the cerro_matoso_1. You may see the operation in following script.
cerro_matoso_group<-cerro_matoso_1%>%
group_by(fecha, Turno)%>%summarize(
Operativo=sum(Operativo),
preventiva=sum(`Mant Preventiva`),
planeada=sum(`Mant Planeada`),
correctiva=sum(`Mant Correctiva`),
overall=sum(`Over Haul`),
programada=sum(`Demora Prog `),
noprogramada=sum(`Demora Noprog `),
stanby=sum(DemoraStandBy))%>%
mutate(Tcalendario=(Operativo+preventiva+planeada+correctiva+overall+programada+noprogramada),
disponibilidad=(Tcalendario-preventiva-correctiva-planeada-overall)/Tcalendario,
utilizacion=Operativo/(Tcalendario-preventiva-correctiva-planeada-overall))
knitr::kable(cerro_matoso_group[1:5, ])
| fecha | Turno | Operativo | preventiva | planeada | correctiva | overall | programada | noprogramada | stanby | Tcalendario | disponibilidad | utilizacion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-07-01 | Dia | 96.93527 | 12 | 24.181943 | 0.000000 | 0.000000 | 17.64361 | 5.239165 | 0 | 156 | 0.7680644 | 0.8090207 |
| 2017-07-01 | Noche | 89.79277 | 12 | 12.000000 | 0.000000 | 0.000000 | 24.04666 | 18.160552 | 0 | 156 | 0.8461538 | 0.6802484 |
| 2017-07-02 | Dia | 87.95666 | 12 | 22.760833 | 0.341111 | 0.000000 | 19.84250 | 13.098888 | 0 | 156 | 0.7749875 | 0.7275276 |
| 2017-07-02 | Noche | 43.32527 | 12 | 8.532222 | 10.242222 | 0.969722 | 80.93055 | 0.000000 | 0 | 156 | 0.7965117 | 0.3486780 |
| 2017-07-03 | Dia | 88.46916 | 12 | 4.510833 | 3.401666 | 11.566944 | 20.58777 | 15.463608 | 0 | 156 | 0.7982087 | 0.7104784 |
In summary it turns out that is needed summarized how much time have each times taken group by date an shift of the all truck. Moreover, 3 variable are mutated to a dataframe, the first Tcalendario is the sum of;
\[preventiva + planeada + correctiva + overall +programada + noprogramada\]
disponibilidad that is:
\[\%disponibilidad = \frac{Tcalendario-preventiva-correctiva-planeada-overall}{Tcalendario}\]
As you can see, disponibilidad is a rate of percentange which assesses the equipment and maintenance group performance.
Other variable added is utilization that is
\[\%utilizacion =\frac{operativo}{Tcalendario-preventiva-planeada-correctiva-overall}\].
Like disponibilidad, utilizacion a rate of how effective the operation use the equipments for moving material over the time delived by maintenance.
cerro_matoso_equipo <- cerro_matoso_1%>%group_by(fecha,Equipo)%>%
summarize(
Operativo=sum(Operativo),
preventiva=sum(`Mant Preventiva`),
planeada=sum(`Mant Planeada`),
correctiva=sum(`Mant Correctiva`),
overall=sum(`Over Haul`),
programada=sum(`Demora Prog `),
noprogramada=sum(`Demora Noprog `),
stanby=sum(DemoraStandBy))%>%
mutate(
Tcalendario=(Operativo+preventiva+planeada+correctiva+overall+programada+noprogramada),
disponibilidad=(Tcalendario-preventiva-correctiva-planeada-overall)/Tcalendario,
utilizacion=Operativo/(Tcalendario-preventiva-correctiva-planeada-overall))
knitr::kable(cerro_matoso_equipo[1:5, ])
| fecha | Equipo | Operativo | preventiva | planeada | correctiva | overall | programada | noprogramada | stanby | Tcalendario | disponibilidad | utilizacion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-07-01 | M3-32 | 0.000000 | 0 | 24.000000 | 0 | 0 | 0.000000 | 0.000000 | 0 | 24 | 0.0000000 | NaN |
| 2017-07-01 | M3-33 | 12.210555 | 0 | 0.000000 | 0 | 0 | 1.896943 | 9.892499 | 0 | 24 | 1.0000000 | 0.5087732 |
| 2017-07-01 | M3-34 | 9.428055 | 0 | 10.468888 | 0 | 0 | 3.814721 | 0.288333 | 0 | 24 | 0.5637963 | 0.6967688 |
| 2017-07-01 | M3-35 | 19.183054 | 0 | 0.000000 | 0 | 0 | 4.816944 | 0.000000 | 0 | 24 | 1.0000000 | 0.7992940 |
| 2017-07-01 | M3-37 | 17.124722 | 0 | 1.713055 | 0 | 0 | 4.168610 | 0.993611 | 0 | 24 | 0.9286227 | 0.7683746 |
| ###cerro_mato | so_gather | dataset. |
cerro_matoso_gather<-cerro_matoso_1%>%
group_by(fecha)%>%
summarize(
Operativo=sum(Operativo),
preventiva=sum(`Mant Preventiva`),
planeada=sum(`Mant Planeada`),
correctiva=sum(`Mant Correctiva`),
overall=sum(`Over Haul`),
programada=sum(`Demora Prog `),
noprogramada=sum(`Demora Noprog `),
stanby=sum(DemoraStandBy))%>%
mutate(
Tcalendario=(Operativo+preventiva+planeada+correctiva+overall+programada+noprogramada),
disponibilidad=(Tcalendario-preventiva-correctiva-planeada-overall)/Tcalendario,
utilizacion=Operativo/(Tcalendario-preventiva-correctiva-planeada-overall))%>%
select(fecha,disponibilidad, utilizacion) %>%gather(key = categoria,value=value,-fecha)
knitr::kable(cerro_matoso_gather[1:5, ])
| fecha | categoria | value |
|---|---|---|
| 2017-07-01 | disponibilidad | 0.8071091 |
| 2017-07-02 | disponibilidad | 0.7857496 |
| 2017-07-03 | disponibilidad | 0.8595192 |
| 2017-07-04 | disponibilidad | 0.8090705 |
| 2017-07-05 | disponibilidad | 0.8245157 |
Ones of the most important mine KPIs are disponibilidad and utilizacion. Both are often taken as part of bussiness cases in which are those features to being optimized. Nevertheless, previously, before this essay both were taken as individual variables without any correlation to each other. What it is false. The projects were bogus due to just make effort on reducing one of them without see the behavior of the another. what it leds to mistakes and non-appoined result. So it is neeeded whether a correlation exists. To see so, it may be carried out a correlation plot or a inferential analysis(F statistic, T student). However for being more sharp, we’d rather liked a Simple linear regression, where if the changes onn utilization are due to chance and there’s no relation with disponibilidad or there’s is relation.
We can frame this investigation into statistical hyphothesis test
H0: Slope=0, The true linear model has slope and there’s no relation between disponibilidad and utilization
HA: slope =!0, the true linear model has a slope different to zero, and utilization’s change is related to disponibilidad.
##----------lays out utilization vs disponibilidad coloured by shift.
##----------furthermore,method simple linear regresion line for more interpretability.
ggplot(cerro_matoso_group, aes(x=disponibilidad, y=utilizacion, col=Turno))+
geom_point()+geom_smooth(se = F,method = lm)
As it can seen graphically, there’s non-zero slope what it means the change on utilization can be explain with changes in disponibilidad, thereby aren’t separeted features. To do a stronger proof. Linear model summary can show a better approach.
lm(utilizacion~disponibilidad, data=cerro_matoso_group)%>%
summary()
##
## Call:
## lm(formula = utilizacion ~ disponibilidad, data = cerro_matoso_group)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.66726 -0.06965 0.04028 0.12112 0.24789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.15122 0.05301 21.717 < 2e-16 ***
## disponibilidad -0.55505 0.06850 -8.103 2.32e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1695 on 716 degrees of freedom
## Multiple R-squared: 0.08399, Adjusted R-squared: 0.08271
## F-statistic: 65.65 on 1 and 716 DF, p-value: 2.323e-15
Here can be seen:
The main reason whereby when disponibilidad increases and utilization decreses is because more equipment are available and the same people available,something like lack of staff.The opposite occurs when the disponibilidad decreases, the utilizacion increase unexpectedly.Then engineers can say that their outstandings managements over the fleets are giving results.
look at when the disponibilidad goes downward the utilizacion goes upward
The previous conclusion opens up widely window where we can see the correlation among the perfomance KPIs let’s do it!
Many hidden correlations suddenly pop up. Yes there’s negative correlation between utilizacion and disponibilidad, isn’t strong but it exists.Following conclusions can be taken from the plot
First of all, the correlation above is just a pearson correlation that doesn’t include categorical variable like Turno that can incoporate different aproaches. It may turn out that simpson’s paradox comes in and the same behavior doesn’t occur anymore.
A scatter plot can show why.
As it has been shown, simpson’s paradox displays other a clue about the effect standy is able to do when Turno pops up.
ggplot(cerro_matoso_group, aes(x=stanby, y=utilizacion,col=Turno))+
geom_point( )+geom_smooth(se = F,method = lm)
Look at this! it seems like utilizacion is influenced by Turno when it is night.So there might be increments in utilizacion due to standby. So the advice only apply at night, so engineer, let’s check closer why does utilizacion increase.
Once utilizacion has been checked, the original dataset give us more data and more analysis do with. So let’s see!
When we work with groups of data, we want to see a regular behavioir inside the group and between groups. In this case, trucks give operativo time for moving and dumbing raw material, and we need to figure out if each truck operativo time is within the variable’s natural variabiity.An ANOVA can asses whether there are different or not among the trucks and if so, which are below of natural performance and why?.
aov(Operativo~Equipo, data=cerro_matoso_equipo)%>%summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Equipo 13 23408 1800.6 38.14 <2e-16 ***
## Residuals 4989 235548 47.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s check which trucks are below
d=pairwise.t.test(cerro_matoso_equipo$Operativo,cerro_matoso_equipo$Equipo, alternative = "two.sided")
d$p.value
## M3-32 M3-33 M3-34 M3-35 M3-36
## M3-33 9.567019e-02 NA NA NA NA
## M3-34 4.476077e-09 1.678001e-02 NA NA NA
## M3-35 5.497719e-03 6.165131e-10 5.012575e-23 NA NA
## M3-36 2.252164e-09 9.585101e-03 1.000000e+00 3.277560e-23 NA
## M3-37 3.928567e-07 1.582986e-01 1.000000e+00 6.299672e-20 1.000000e+00
## M3-38 9.421295e-05 6.744131e-13 2.599119e-27 1.000000e+00 1.919929e-27
## M3-39 2.056486e-04 2.440391e-12 1.611487e-26 1.000000e+00 1.173195e-26
## M3-40 3.975589e-02 1.000000e+00 4.365755e-02 8.196918e-11 2.636163e-02
## M3-41 4.942497e-10 4.891354e-03 1.000000e+00 1.605858e-24 1.000000e+00
## M3-42 8.761861e-04 1.000000e+00 6.604929e-01 3.853981e-14 4.411076e-01
## M3-43 8.074151e-07 2.271800e-01 1.000000e+00 2.119518e-19 1.000000e+00
## M3-44 1.000000e+00 1.108009e-01 6.303416e-09 4.600089e-03 3.171901e-09
## M3-45 1.230655e-08 2.884238e-02 1.000000e+00 2.479742e-22 1.000000e+00
## M3-37 M3-38 M3-39 M3-40 M3-41
## M3-33 NA NA NA NA NA
## M3-34 NA NA NA NA NA
## M3-35 NA NA NA NA NA
## M3-36 NA NA NA NA NA
## M3-37 NA NA NA NA NA
## M3-38 5.938097e-24 NA NA NA NA
## M3-39 3.280270e-23 1.000000e+00 NA NA NA
## M3-40 3.403895e-01 6.909860e-14 2.620584e-13 NA NA
## M3-41 1.000000e+00 6.050165e-29 4.046520e-28 0.01442720 NA
## M3-42 1.000000e+00 1.286961e-17 5.735932e-17 1.00000000 3.034371e-01
## M3-43 1.000000e+00 2.238479e-23 1.195735e-22 0.45370151 1.000000e+00
## M3-44 5.321288e-07 7.325583e-05 1.614330e-04 0.04560703 6.981636e-10
## M3-45 1.000000e+00 1.480952e-26 9.057542e-26 0.07146187 1.000000e+00
## M3-42 M3-43 M3-44
## M3-33 NA NA NA
## M3-34 NA NA NA
## M3-35 NA NA NA
## M3-36 NA NA NA
## M3-37 NA NA NA
## M3-38 NA NA NA
## M3-39 NA NA NA
## M3-40 NA NA NA
## M3-41 NA NA NA
## M3-42 NA NA NA
## M3-43 1.000000000 NA NA
## M3-44 0.001094003 1.085603e-06 NA
## M3-45 0.948545202 1.000000e+00 1.717715e-08
This is insane!, none has time to see this, this isn’t data analysis. Let’s see visually.
##---------------------------lays out a boxplot as a proof of different operativo's time.
g=ggplot(cerro_matoso_equipo,aes(x=Equipo,y=Operativo, col=Equipo))+geom_boxplot()+ylab('Operativo(hours)')
ggplotly(g)
The total time given by every truck within a day(24hours)
As you have noticed visually, there almost 4 truck that seem to be underneath of the median.